---
layout: page
title: Data Science Master's Thesis
htmlwidgets: TRUE
permalink: /predictive-analytics-thesis/
---

Welcome to my spot on the web for drafts, supplemental material, and general thoughts about doing a thesis project for the Master of Science in Predictive Analytics (MSPA) degree, now the Master's in Data Science (MSDS) program, from Northwestern University. Below the interactive plots, I'm developing a sort of "epilogue" containing thoughts about doing a data science Master's, choosing the thesis option, and some of the things I've learned along the way.

Thesis Paper

I'll update this section with drafts as they get finished.

2018-11-04: I have a (mostly) completed draft you can check out here on Google Drive. I'm currently awaiting comments from readers, so no doubt it will change substantially. I haven't put in a Table of Contents and I'm still figuring out how to list the supplemental materials you'll find on this page, but everything else is there (hooray!).

2018-12-16: A lot has changed in the last month or so! I've decided to push back my tentative graduation date from this month to the end of the 2019 Winter Quarter, in part due to starting a new position as a data scientist at Highmark Health here in Pittsburgh. I had the thesis draft reviewed by my first reader, who suggested some restructuring of the Conclusions section but otherwise found it to be good.

I spent a few weeks away from the thesis, which allowed me to come back to it with a fresh set of eyes. I made some grammar edits and added the Table of Contents as well as the Appendix listing the supplemental material (links to the GitHub repo and this webpage). The most recent version is v.4.0, which can be accessed here. This is a completely formatted draft with all the necessary components as outlined in the Graduate Thesis Handbook.

I'm happy to have some time to finish the process in a way that isn't rushed. I'll be working over the holidays to restructure the Conclusions section and hope to get notes from a second reader by the end of January. Barring any substantial unforeseen issues, I should have everything done by the March 15th deadline to graduate at the end of the Winter 2019 quarter (hooray!).

Code

All the code (mostly in R) for the thesis can be found in the project repo on GitHub.

Supplemental Material

Interactive Multidimensional Scaling Plots

Below are four interactive multidimensional scaling (MDS) plots of genetic profiles developed from open-source RNA-seq data available from the Aging, Dementia, and TBI Study from the Allen Institute for Brain Science.

Use your mouse to grab them, rotate them, and zoom in and out. Hovering over a data point gives the point's coordinates in the first three MDS dimensions. Each point represents a genetic profile (based on expression levels for 50,000+ genes and gene isoforms) for an individual patient/donor.

These were made using Plotly and htmlwidgets for R. Check out this blog post for more on multidimensional scaling of gene expression level data.
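For readers new to MDS, here is a minimal sketch of the idea: turn pairwise distances between profiles into low-dimensional coordinates that preserve those distances as well as possible. The plots on this page were made in R; this is a generic Python illustration, not the thesis code. The classical (Torgerson) variant, the Euclidean distance measure, the power-iteration solver, and the tiny `EXAMPLE_PROFILES` data are all assumptions chosen to keep the example self-contained.

```python
import math
import random


def pairwise_distances(profiles):
    """Euclidean distance between every pair of profiles (rows)."""
    return [[math.dist(a, b) for b in profiles] for a in profiles]


def classical_mds(dist, k=3, iters=500, seed=0):
    """Embed n items in k dimensions from an n x n distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into inner products.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for dim in range(k):
        # Start from a random unit vector.
        v = [rng.random() - 0.5 for _ in range(n)]
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        # Power iteration finds the leading eigenvector of what is left of b.
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][dim] = scale * v[i]
        # Deflation removes this component so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


# Hypothetical toy "profiles" (5 donors x 3 genes), invented for illustration.
EXAMPLE_PROFILES = [
    (0.0, 0.0, 0.0),
    (10.0, 0.0, 0.0),
    (0.0, 3.0, 0.0),
    (10.0, 3.0, 1.0),
    (5.0, 1.0, 0.5),
]

distances = pairwise_distances(EXAMPLE_PROFILES)
embedding = classical_mds(distances, k=3)
```

With real expression data there are tens of thousands of genes per donor, so the three plotted dimensions only approximate the original distances. In this toy case the data are already three-dimensional, so the embedding reproduces the distances up to rotation and reflection.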

Shaded by Brain Region

HIP = hippocampus
FWM = forebrain white matter
PCx = parietal cortex
TCx = temporal cortex

[Interactive 3D MDS plot, shaded by brain region]

Shaded by Donor Sex

[Interactive 3D MDS plot, shaded by donor sex]

Shaded by Lifetime Number of Traumatic Brain Injuries (TBIs)

[Interactive 3D MDS plot, shaded by lifetime number of TBIs]

Shaded by Dementia Status

[Interactive 3D MDS plot, shaded by dementia status]

Differential Expression Analysis Filtering & p-Value Cutoff Experiments

A comparison of the numbers of "significant" genes obtained with different filtering parameters and p-value cutoffs for determining differential expression in donors with dementia.

Filtering & P-Value Cutoff Experiment Spreadsheet
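The shape of this kind of experiment can be sketched in a few lines: for each combination of expression filter and p-value cutoff, count the genes that pass both. The Python below is an illustration only, not the thesis code (which is in R). The gene names, read counts, p-values, and the choice of a minimum mean read count as the filter are all hypothetical.

```python
from itertools import product

# Hypothetical per-gene results: (gene id, mean read count, p-value).
# These values are invented for illustration and are not thesis data.
RESULTS = [
    ("gene_a", 250.0, 0.0004),
    ("gene_b", 3.0, 0.0200),
    ("gene_c", 80.0, 0.0300),
    ("gene_d", 12.0, 0.0008),
    ("gene_e", 500.0, 0.2000),
]


def count_significant(results, min_mean_count, p_cutoff):
    """Count genes that pass the expression filter and fall below the cutoff."""
    return sum(
        1
        for _, mean_count, p_value in results
        if mean_count >= min_mean_count and p_value < p_cutoff
    )


def cutoff_grid(results, min_counts, p_cutoffs):
    """Map each (filter, cutoff) pair to its number of 'significant' genes."""
    return {
        (min_count, p_cutoff): count_significant(results, min_count, p_cutoff)
        for min_count, p_cutoff in product(min_counts, p_cutoffs)
    }


grid = cutoff_grid(RESULTS, min_counts=[0, 10, 100], p_cutoffs=[0.05, 0.001])
for (min_count, p_cutoff), n_genes in sorted(grid.items()):
    print(f"min mean count {min_count:>3}, p < {p_cutoff}: {n_genes} genes")
```

One simplification to keep in mind: this sketch filters a fixed set of p-values. In a typical pipeline the filter is applied before testing, so changing it can also change any multiple-testing-adjusted p-values.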

Brain Region Intersection Gene Details

As a part of the exploratory analysis of the RNA-seq transcriptome data, I investigated the 29 genes that had altered expression patterns in donors with dementia in all four brain regions sampled (hippocampus, forebrain white matter, parietal cortex, and temporal cortex).

Brain Region Intersection Gene Details
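Finding genes shared across all four regions amounts to a set intersection over the per-region lists of differentially expressed genes. The Python sketch below shows the idea with made-up gene identifiers; it is not the thesis code, and the `REGION_HITS` contents are hypothetical placeholders rather than the 29 genes described above.

```python
# Hypothetical differentially expressed genes per brain region.
# The identifiers are placeholders, not results from the thesis.
REGION_HITS = {
    "HIP": {"gene_a", "gene_b", "gene_c"},
    "FWM": {"gene_a", "gene_c", "gene_d"},
    "PCx": {"gene_a", "gene_c"},
    "TCx": {"gene_a", "gene_c", "gene_e"},
}


def shared_genes(hits_by_region):
    """Return the genes that appear in every region's hit list."""
    gene_sets = list(hits_by_region.values())
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


shared = sorted(shared_genes(REGION_HITS))
print(shared)
```

Because the intersection requires a gene to appear in every set, a single region with a short hit list caps how many shared genes can be found.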

Epilogue

Things I've Learned by Doing a Data Science Master's Thesis

As things start to wrap up for me, I'm finding myself reflecting on the entire experience of doing the MSPA program. Maybe you stumbled onto this page because you're thinking of pursuing a data science Master's degree. Or maybe you're already in the MSDS program at Northwestern or somewhere else and are trying to make the "thesis or capstone" decision. In this section, I'll be keeping a list of some of the things I've learned from doing this degree, with a focus on doing a thesis project. Just my $0.02. FWIW, etc. I'm putting it down here as a sort of epilogue to the thesis once she's all done.

“Life can only be understood backwards; but it must be lived forwards.” - Kierkegaard